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Forward flux sampling (FFS) provides a convenient and efficient way to simulate rare events in 
equilibrium or non-equilibrium systems. FFS ratchets the system from an initial state to a final 
state via a series of interfaces in phase space. The efficiency of FFS depends sensitively on the 
positions of the interfaces. We present two alternative methods for placing interfaces automati- 
cally and adaptively in their optimal locations, on-the-fly as an FFS simulation progresses, without 
prior knowledge or user intervention. These methods allow the FFS simulation to advance efficiently 
through bottlenecks in phase space by placing more interfaces where the probability of advancement 
is lower. The methods are demonstrated both for a single-particle test problem and for the crystal- 
lization of Yukawa particles. By removing the need for manual interface placement, our methods 
both facilitate the setting up of FFS simulations and improve their performance, especially for rare 
events which involve complex trajectories through phase space, with many bottlenecks. 



I. INTRODUCTION 

Many important processes in nature can be described 
as rare events - i.e. events that happen rapidly but 
unpredictably, with long waiting times between occur- 
rences. Examples of such processes range from large- 
scale problems such as electricity or computer network 
failures, to molecular level processes such as the forma- 
tion of crystal nuclei or vapour bubbles in metastable 
liquids. Rare events are difficult to study, either in ex- 
periments or in computer simulations, because most of 
the observation time is spent waiting for the fluctuation- 
driven event to happen. In simulations, this problem 
can be overcome using rare event simulation techniques 
such as umbrella sampling [HH], Bennett-Chandler meth- 
ods [SHI], transition path sampling [5H7], transition inter- 
face sampling [8 -10 , milestoning [1U - TT5] . nudged elastic 
band [HI IT5] , string methods |16l IT7] , weighted-ensemble 
methods |18j , non-equilibrium umbrella sampling or for- 
ward flux sampling (or splitting)-type methods [ISH2H]. 
All of these methods aim to enhance the sampling in 
the region of phase space (or trajectory space) that cor- 
responds to the rare event, while reducing the amount 
of time the simulation spends in the uninteresting phase 
space (or trajectory space) regions corresponding to the 
waiting times. 

In this paper, we focus on the forward flux sampling 
(FFS) approach (25). In FFS, one uses an order parame- 
ter to measure the progress of the system from the initial 
state towards the final state. The region of phase space 
between the initial and final states is partitioned by a 
series of interfaces defined by specific values of the or- 
der parameter. These interfaces are used to ratchet the 
system from the initial to the final state. Short trajec- 
tories are fired from the initial state; if these reach the 
first interface, they are used as starting points for further 
trajectories, which, if they reach the second interface, 
are used as starting points for further trajectories, etc. 
During this procedure, the fraction of trajectories which 
reach the next interface is monitored. The product of 



these "success probabilities" over all interfaces, together 
with the flux of trajectories out of the initial state, gives 
the transition rate from initial to final state. Unbiased 
transition trajectories can be reconstructed from the col- 
lection of trajectories between interfaces. FFS provides 
a convenient way to simulate rare events in stochastic 
dynamical systems, because it is rather simple to im- 
plement and allows direct calculation of the transition 
rate. Importantly, FFS is suitable for both equilibrium 
and non-equilibrium systems (since it does not require a 
priori knowledge of the phase space density) Re- 
cent advances in FFS-type methods include the devel- 
opment of different algorithms for the trajectory-firing 
procedure [20., computation of phase-space densities as 
well as transition rates [29J, analysis and optimization 
of the efficiency of the method [2TJ |SJJ [H [55J [SSJ , and 
the development of FFS-like methods for systems which 
are out of the stationary state |30[ IB"!]. While FFS is of 
course not a panacea for all rare event problems [521 [55] , 
it is widely and successfully used for a range of systems, 
some of which would be difficult or impossible to tackle 
with other methods. The validity of FFS has been ex- 
tensively tested against brute force simulations and other 
rare event simulation methods for a range of problems 

[m [MrET] 

In this paper, we focus on the placement of interfaces 
in FFS. The number of interfaces and their positions are 
important inputs in FFS, since poor interface placement 
can have strongly detrimental effects on the efficiency 
[211 [55] , If the interfaces are placed too far apart the 
probability of reaching the next interface will be very low, 
and much effort will be wasted firing trajectories that fail 
to progress. On the other hand, if the interfaces are too 
close together, trajectories will be highly correlated be- 
tween successive interfaces so that little new information 
is gained at each interface. 

Borrero and Escobedo [381 [59"] have shown that, for a 
fixed number of interfaces, the efficiency of FFS simula- 
tions is optimized when the flux of trajectories between 
interfaces is equalized: i.e., for a fixed number of trial 
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trajectories per interface, the probability of reaching the 
next interface should be equal for all interfaces. This cri- 
terion allows optimal interface placement, if one has prior 
knowledge of how the success probabilities depend on the 
order parameter. Such knowledge is, however, not usu- 
ally available. Borrero and Escobedo suggest beginning 
with non-optimized interfaces and using the constant flux 
criterion to iteratively improve the interface placement in 
successive FFS runs [35] [35]. While this is a good strategy 
for some problems, it is problematic for computationally 
expensive systems with high barriers. For these systems, 
FFS simulations with poorly chosen interface sets simply 
will not finish in a reasonable computational time. This 
forces the user to spend much effort on finding a rea- 
sonable initial interface set, by manual trial-and-error. 
Moreover, repeating the FFS simulations to obtain itera- 
tively better interface sets is computationally expensive. 
To our knowledge, the only interface-based method that 
does not require a priori interface placement is adap- 
tive multi-level splitting (AMS) [40 . In this method, 
successive interfaces are placed adaptively, based on the 
furthest point in order parameter space reached by pre- 
vious trajectories. Practical implementation of AMS is, 
however, more complex than for standard FFS, because 
one needs to keep track of the histories of previous tra- 
jectories in order to determine the start points of new 
trajectories. This is likely to involve coding and storage 
overheads, particularly for large systems. 

In this paper, we present two methods which allow 
optimal placement of interfaces in standard FFS simula- 
tions, on-the-fly, without any prior knowledge. We first 
use theoretical arguments to estimate the optimal range 
for the flux between interfaces, when the number of in- 
terfaces is not fixed (section [TTJ) . In section III we present 
our algorithms and discuss the situations in which we 
expect each to be advantageous. We demonstrate the 
performance of both methods in section |IV| first for a 
one-particle test problem and then for a computationally 
expensive rare event problem: crystallization in a system 
of Yukawa particles. Finally we present our conclusions 
in section Iv] 




Figure 1: Schematic illustration of the DFFS algorithm. The 
barrier region between A a and As is partitioned by a series 
of interfaces, defined as values of the order parameter A (hor- 
izontal axis). The black dots denote stored configurations; 
and the coloured arrows (colour coded by interface) repre- 
sent trajectories. The vertical axis denotes simulation time, 
increasing downwards. 



A. The direct FFS algorithm (DFFS) 

The aim of FFS is to compute the transition rate kAB 
from an initial state A to a final state B, while at the 
same time sampling the associated transition trajecto- 
ries. The transition rate Uab is given by kAB = &Pb 
[S], where $ is the flux of trajectories leaving the initial 
state, and Pg is the probability that a trajectory that 
leaves the initial state will subsequently make it to the 
final state (rather than returning to the initial state). In 
FFS, the initial and final states are defined in terms of 
an order parameter A, such that if A < A^ the system is 
in the initial state and if A > As it is in the final state. 
Intermediate values of A (Xa < X < Xb) correspond to 
the "barrier" region. This barrier region is partitioned by 
a series of n interfaces, defined by specific values of A, 
such that Xi < Aj+i, Ao = A^ and A n = Xb ( see Figure 
[I]). The probability Pg can be written as [S] 



Pp. 



II. OPTIMIZATION PRINCIPLES FOR 
INTERFACE PLACEMENT IN FFS 



In this section, we first briefly describe the "direct" 
FFS algorithm. We then review the work of Borrero and 
Escobedo which shows that, for a fixed number of in- 
terfaces, optimal efficiency requires equal fluxes between 
interfaces |38l 155], Building on this work, we establish 
the optimal range for the transition probability between 
interfaces, in the case where the number of interfaces is 
not constrained. This optimal range will be used as input 
for the computational algorithms described in section [TTT| 



lift 

i=0 



(1) 



where pi is the conditional probability that the sys- 
tem, having reached interface i, subsequently goes on 
to reach interface i + 1 before returning to the initial 
state. FFS provides a practical and efficient way to com- 
pute and pi for each interface, thus allowing the com- 
putation of kAB- The algorithm also generates transi- 
tion trajectories. While several variants of FFS exist 
US E3 EE], we focus here on the direct FFS algorithm 
(DFFS) [ig EH EH]- 

The DFFS algorithm has two stages. In the first stage, 
the flux $ across the first interface Aq is computed by sim- 
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ulating a system in the initial state and monitoring the 
frequency with which the trajectory crosses Ao in the di- 
rection of increasing A. When these crossings happen, the 
configuration of the system is stored; this simulation thus 
generates not only a measurement of $ but also a collec- 
tion of No configurations corresponding to states of the 
system at the moments of crossing Ao [49 . In the second 
stage of the algorithm, the probabilities pi are computed 
in a step- wise fashion (see Figure [l] for illustration). To 
compute pi . one chooses configurations at random from 
the collection stored at Ao and uses them to initiate new 
"trial" trajectories, which are continued until they either 
reach Ai ("success") or return to Ao ("failure"). For suc- 
cessful trajectories, the final configuration at Ai is stored 
in a new collection. After M trial trajectories have been 
fired, pi is computed by dividing the number of successes 
by Mq. One then repeats the same procedure, using the 
configurations at Ai as starting points for Mi trajectories 
that are continued until they reach A2 or return to Ao, 
and so on, until the final interface is reached and one has 
a complete set of estimated probabilities pi. Transition 
trajectories from the initial to the final state can then 
be reconstructed from the set of successful trajectories 
between interfaces [2D1 122] . 



in "bottleneck" regions of the phase space (note that here 
Borrero and Escobedo assume that, since the number of 
interfaces is fixed, the computational cost does not de- 
pend strongly on the interface placement, and does not 
need to be considered in the optimization). An alterna- 
tive formulation of the constant flux rule, put forward by 
Borrero and Escobedo, states that the quantity 



E"=o llo g^ 



(3) 



should be linear when plotted against the interface index 
i. To see this we note that if all the transition probabili- 
ties are equal, pj = p = const, then 



fi 



E?=o llo sp 



(4) 



On can therefore measure the "quality" of a particular set 
of interfaces, either by directly asking whether the success 
probability pi is the same across different interfaces, or 
by testing whether fi is linear when plotted against the 
interface number i. 



B. Equalization of fluxes between interfaces 



C. Optimal transition probability 



Under the assumption that trajectories decorrelate be- 
tween adjacent interfaces, analytic results for the com- 
putational efficiency of the DFFS method (and related 
methods) can be derived [2T|. Even though these as- 
sumptions may not always be satisfied for many real 
problems, these analytical predictions still give a useful 
general guide to the performance of the method. In par- 
ticular, by modelling the number of successful trajecto- 
ries from interface i as a binomially distributed random 
variable with parameter Pi, one can obtain predictions for 
the computational cost, and the statistical error, associ- 
ated with the computation of the rate constant Uab for 
given choices of the number n of interfaces, the numbers 
Mi of trial trajectories, the number N of configurations 
at Ao, and for given values of pi [2T]. For DFFS, the 
variance V in the estimated rate constant is given ap- 
proximately bv [2T|l22] 



V 



E 

i=0 



(2) 



Borrero and Escobedo PES EOS have shown that, for fixed 
n, {Mi} and Pb, Eq. ^ can be minimized by placing the 
interfaces such that MiPi is the same for all interfaces - 
i.e. the statistical error is smallest when the net flux 
of trajectories between successive interfaces is constant. 
Assuming, for simplicity, that one fires the same number 
of trajectories for each interface (Mj = M = const), one 
should place the interfaces such that pi is the same for 
all interfaces. Thus, interfaces should be closer together 
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Figure 2: Theoretical prediction (Eq.( A9 1) for the efficiency £ 
of a hypothetical rare event problem with M = 200 and No = 
100, plotted as a function of p for several values of Pb and of 
the cost parameters R and S (see text and Appendi x [A 1. The 
insets show the predicted computational cost C (Eq.( A6 1) and 



variance in the rate constant V (Eq.(A8l, which follows from 
Eq.§). 



Borrero and Escobedo's work shows that for n inter- 

— p = PJ 



B 



faces, the optimal positioning is such that pi 
(this follows from Eq. (JTJ). However, if we are to place 
interfaces optimally, on-the-fly, we also wish to optimize 
the number n of interfaces. This is equivalent to optimiz- 
ing the crossing probability p, under the constraint that 
n = log Pb/ log p. Here we compute the optimal value of 
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p, and we find that the efficiency is rather insensitive to 
p over a broad range of parameter values. 

In optimizing the efficiency with respect to p we need 
to consider both computational cost and statistical error, 
since we expect both to depend on p. Following Ref. 
|21j . we define the computational efficiency of a DFFS 
calculation as 



where V measures the statistical error, as in Eq.Q, andC 
is the computational cost of the calculation. An approx- 
imate expression for C in terms of p is given in Appendix 
[A] where we also rewrite Eq. Q for V in terms of p and 
Pb- Combining these expressions allows us to write an 
approximate analytical expression for the efficiency £, as 
a function of p. This expression is given in Appendix [X] 
it depends only on p, Pb, M and two constants R and S 
which measure the cost of generating a single configura- 
tion at Ao, and the cost of a trajectory from Xa to Xb (see 
Appendix |A|) . Figure [2] shows the function £(p), plotted 
for a hypothetical rare event problem in which M = 200 
and N = 100, for several values of Pg, R and S. £(p) 
is non-monotonic, with a peak at the optimal value of 
p. This non-monotonicity arises from contrasting trends 
in the computational cost and the statistical error (see 
insets to Figure |2| : while the cost increases with p (be- 
cause increasing p implies more interfaces, and thus more 
trajectories), the statistical error decreases with p (since 
more interfaces implies more accurate measurement of 

kAB)- 

Encouragingly, for all the parameter combinations 
tested, the peak in £{p) in Figure [2] is rather broad, 
suggesting that one may achieve high computational ef- 
ficiency without needing to control the interface cross- 
ing probability too precisely. For low values of p (less 
than about 0.2) the efficiency does, however, decrease 
drastically. Thus our calculations suggest that placing 
the interfaces such that the success probability is larger 
than about 0.3 should generally result in high computa- 
tional efficiency. However, there is of course an upper 
limit on the value of p that is sensible. For very high 
crossing probabilities, the interfaces become very close 
together and trajectories between successive interfaces 
become correlated - a factor that is not taken into ac- 
count in our theoretical analysis. This is likely to com- 
promise efficiency because correlated measurements at 
closely spaced interfaces incur computational overheads 
but provide little extra information. Taking this into ac- 
count, choosing a value of p in the range 0.3 < p < 0.7 
appears to be a sensible rule of thumb for most rare event 
simulation problems. 

III. ALGORITHMS FOR AUTOMATIC 
ON-THE-FLY INTERFACE PLACEMENT 

We now present two algorithms which automatically 
position the interfaces during a DFFS simulation so as 
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Figure 3: Schematic illustration of the trial interface method. 
The probability of reaching the trial interface Atrial is es- 
timated by firing a small number of trial runs from the 
current interface Ai. The position of the trial interface is 
accepted if the estimated probability p est is in the range 
Pest £ bminjPmaxJ- Otherwise, the trial interface is shifted 
according to Eq.((6| 



to achieve a desired value of the success probability p. 
At the beginning of the simulation, the positions of the 
interfaces are not defined: the user specifies only the 
boundaries of the initial and final states (A^ and Xb) and 
the desired value (or range) for p, as well as a minimal 
distance between interfaces so as to avoid correlations. 
Starting from A ,4 = Ao, the algorithms place interfaces 
on-the-fly - i.e. first the optimal value of Ai is deter- 
mined, then trajectories are fired to Ai, then A2 is op- 
timized, then trajectories are fired to A2, etc. In these 
algorithms, the number n of interfaces adapts automati- 
cally to the choice of p. 

In order to place Ai + i optimally, given that the simu- 
lation has arrived at Ai , one needs to make an estimate of 
how the transition probability pi depends on Aj+i. Both 
algorithms achieve this by firing a small number of "ex- 
ploratory" trajectories from A.;; the difference between 
the two algorithms lies in the way that the information 
from these exploratory trajectories is used. 



A. Trial interface method 

In the "trial interface" method we position interfaces 
such that the transition probability p lies within user- 
defined acceptable bounds p £ [p m iu,Pmax]- To achieve 
this, we choose a trial position Atrial for the next interface, 
obtain an estimated transition probability p cst for this 
trial interface, and then, based on this information, shift 
the trial interface until p cst lies within the range p est € 
[PminjPmax]- Assuming that the DFFS simulation has 
reached interface A^, the algorithm proceeds as follows 
(see also Figure |3|: 
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1. Choose a trial position Atrial for the next interface 
Aj+i, in the range Aj < Atrial < Xb- This should 
be done in a way appropriate to the problem being 
studied; we typically set Atrial = Aj + b x (As — Xa), 
where 0.01 < b < 0.1, but one could also use for 
example Atrial = Aj + (Aj - Aj_i). . 

2. Using as starting points the configurations stored 
at Aj, fire Aftriai trajectories, which are continued 
until they reach Atrial or A^. M tr iai should be sig- 
nificantly smaller than the typical number of M 
trajectories per interface in the complete FFS sim- 
ulation. 

3. Compute p es t = iVg/JWtriai where Ns is the number 
of trial trajectories that reached Atrial- 

4. If p m i n < Pest < Pmaxi accept the trial interface. 
Otherwise, choose a new trial interface position ac- 
cording to 

Atrial, now — Au ial, old 

+ A s topAp (6) 

where 

a J \Pest -Pmax) if Pest > P max /«\ 

Ap = \i \ -t ( 7 ) 

[ (Post - Pmin) II Post < Pmin 

and fire trial trajectories to obtain a new p est for 
this trial interface. Repeat this procedure until 
Pest lies within the desired range. If the resulting 

value Atrial, now < A-j + G? m in, Set Atrial = Aj + d m i n , 

where d mnl is the user-defined minimal acceptable 
distance between interfaces. If Atrial, new > Xb set 

Atrial = Xg. 

5. Set Aj+l = Atrial- 

6. Continue with the DFFS simulation - i.e. fire M 
trajectories to Aj + i to compute p 4 and obtain a new 
collection of configurations at Aj+i, as in the stan- 
dard DFFS procedure. Any trajectories previously 
fired to this interface during step 5 can be included 
in the estimate of pj . 

In this algorithm, in addition to p m i n and p ma x, the 
user-defined parameters are the stepwidth A s t ep (which 
determines how far the interface is shifted in each ad- 
justment step), Mt r iai, the number of trial trajectories 
used to obtain p est (typically M tria i « 15), and d min , 
the minimal acceptable spacing between interfaces. This 
latter parameter is introduced to avoid excessive correla- 
tion between the sampling at successive interfaces, even 
if Pmin is chosen to be small. The choice of d m i n depends 
on the choice of order parameter and the dynamics of the 
system being studied. For example, if the order parame- 
ter is discrete, d m i n should be at least one. In continuous 
systems, it should prevent the system from being able to 
cross several interfaces in a single timestep, and should be 
larger for systems whose dynamics is slow to decorrelate. 
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Figure 4: Schematic illustration of the exploring scouts 
method. A pre-defined number of trial trajectories are 
launched from the current interface Xi. These trajectories 
continue until they reach the initial state A or the final state 
B, or until the maximum number of steps is reached. The 
maximum values of A reached by the trial trajectories are then 
used to determine the position Ai+i of the next interface. 



The choice of interface shifting rule (point 4 in the al- 
gorithm described above) is not unique. We expect this 
rule to work well for systems with steep energy barri- 
ers, where one needs the initial interfaces to be closely 
spaced. However, for systems with flatter barriers, one 
might prefer to use a bisectional scheme, in which the 
trial interface is initially placed midway between Aj and 
Xb, and is then shifted forwards or backwards by bisect- 
ing the space between itself and either A j or Xb- 

The trial interface method is conceptually simple and 
can be implemented with only very minor modifications 
to an existing DFFS simulation code. The method also 
has the advantages that estimated transition probabili- 
ties for several possible trial interface positions can be 
computed in parallel on separate processors, and that 
any trial trajectories fired to interfaces that are eventu- 
ally accepted can be reused in the final calculation of pj. 
The method does, however, have the potential drawback 
that it relies on a reasonably good first estimate of Atrial: 
if this first estimate is very poor, the algorithm may take 
many iterations to find an acceptable interface position. 
This problem is avoided in our second approach, the "ex- 
ploring scouts" method. 



B. Exploring scouts method 

In the "exploring scouts" method, we again fire M tr i a i 
trial trajectories from interface Aj , but this time without 
defining a trial interface position. In this method, illus- 
trated in Figure [4] the trial trajectories are continued 
until they reach either Xb or A^, or until a user-defined 
maximum number of steps is exceeded. The maximum 
value of A achieved by each trial trajectory is monitored, 
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and the distribution of these values is used to position the 
next interface such that the success probability is close 
to a user-defined desired value p^es ■ The exploring scouts 
algorithm proceeds as follows: 

1. Fire M tr i a i trial trajectories from interface A^, start- 
ing from the configurations generated by the DFFS 
algorithm. Continue each trajectory until it either 
reaches A^ or A^, or exceeds m max steps. Record 
the maximum value of A achieved in each trial tra- 
jectory. 

2. Generate a ranked list of maximum A values for 

all trial trajectories - i.e. assign each trajectory 

an index k in the range < k < Mtriai j such that 
,(fc) ,(fc+i) 

^max ^ ^max • 

3. Compute fcd os = L-^triai(l — Pdes)J and set the po- 
sition of the next interface Ai+i = Amax ■ If the re- 
sulting value A i+ i < X. t + d min , set X t+1 = X. t + d min 
(where d m i n is the minimal acceptable spacing be- 
tween interfaces as in the trial interface method). 

4. Continue with the DFFS simulation - i.e. fire M 
trajectories to A^ + i to compute pi and obtain a new 
collection of configurations at Aj+i, as in the stan- 
dard DFFS procedure. 

This algorithm works because the trial trajectories, or 
"exploring scouts", supply information on the probability 
of reaching a particular value of A, for all A in the range 
Aj — > Xb- For entry k in our ranked list, k exploring 

(k) 

scouts failed to reach Amax and A/trial — k scouts reached 
Amlx or beyond (note k runs from zero to M tria i — 1). 

The transition probability for an interface placed at Amax 
would therefore be approximately (Af tr i a i — fc)/M tr i a i. We 
can obtain a next interface position Aj+i corresponding 
approximately to our desired transition probability p^ cs 
simply by picking the |_Aftrial(l - Pdes)J-th entry in our 
list of maximal A values. More precise versions of this al- 
gorithm are of course possible (e.g. interpolating between 
Amlx values in our list). However, because the efficiency 
is in general not very sensitive to the precise value of p, 
we do not find these to be necessary. 

The user-defined parameters for this method are the 
target probability Pdes> the number Af tr i a i of exploring 
scouts, the minimal interface spacing d m in and the limit 
m max on the number of simulation steps per trial trajec- 
tory. If TO m ax is set too low, the algorithm will fail to 
explore regions of larger A, and may tend to place the in- 
terfaces too close together (i.e. the true p will be smaller 
than Pdes)- Choosing a large value of m max will, however 
make the algorithm more computationally expensive. 

The exploring scouts method has the advantage that 
one knows a priori how many trial trajectories will be 
required to set the next interface position - this may be 
important in parallelized FFS applications. Furthermore, 
the number of user-defined parameters is fewer than in 
the trial interface method. The exploring scouts method 



requires slightly more modifications to an existing stan- 
dard DFFS code than the trial interface method, since 
one needs to track the maximal values of A for the trial 
trajectories, but it is nevertheless rather simple to imple- 
ment. 



IV. EXAMPLES 

We now demonstrate our interface placement methods 
for two test problems. First, we study the toy problem of 
a single particle undergoing Langevin dynamics in a one- 
dimensional potential; this also provides an opportunity 
to test the predictions for the computational efficiency 
made in section |II C| Next, we demonstrate the utility 
of the methods for the much more computationally de- 
manding example of crystal nucleation in a system of 
particles interacting via a Yukawa potential. 

A. A single particle in a one-dimensional potential 

We first consider a single particle moving in one di- 
mension, in a potential with two minima, defined by 
V(x) = (h/2) [1 — cos (kx)] for x in the range [—1,3]. 
The height of the potential barrier, at x = 1, is h = 
\2kgT. The particle, which is initially placed in the re- 
gion —0.2 < x < 0.2, undergoes underdamped Langevin 
dynamics. We set fc^T = 1, m = 1, dt = 0.001 and 
the friction coefficient 7 = 1; with these parameters the 
crossover between ballistic and diffusive motion occurs 
on a timescale of about 1000 time steps or a dimension- 
less distance of 1. Our reaction coordinate A is taken 
to be the position x of the particle and the borders of 
the initial and final states are defined by A^ = 0.2 and 
A B = 2. 

We carry out DFFS simulations for this problem, using 
both the trial interface method and the exploring scouts 
method. For both methods, we set Af t riai = 100, M — 
1000, N a = 3000 and d m in = 0.01. In the trial interface 
method, we set p m in = 0.4, p m ax = 0.6 and the initial trial 
position for interface A^ + i is chosen to be Ai+0.1(As — Aj). 
In the exploring scouts method, we set pdes = 0.5 and 

TOmax = 10 5 . 

Figure [5] shows the positions of the interfaces (main 
plot), and the resulting success probabilities pi (inset), 
for the trial interface and exploring scouts methods. Both 
methods produce probabilities pi that are approximately 
uniform across the interfaces, as desired. This corre- 
sponds to a highly non-uniform interface spacing: in fact 
all the interfaces are located prior to the maximum of 
the potential barrier, with the final interface lying close 
to the maximum. Figure [6] shows the quantity /j, defined 
in Eq.Q, for both methods. This is indeed close to lin- 
ear, confirming that the interface placement is close to 
optimal. 

These methods allow us to choose the success proba- 
bility p and place interfaces accordingly. We can there- 
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Figure 5: Single particle test case: positions Xi of the inter- 
faces as a function of interface index i, for the trial interface 
method (squares) and the exploring scouts method (circles). 
The maximum of the potential barrier is at X = 1.0, shown by 
the solid horizontal line. The inset shows the success proba- 
bilities pi, plotted as a function of interface index i. 
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Figure 6: Single particle test case: fi, as defined in Eq. 
Q, as a function of interface index i, for the trial interface 
method (squares) and the exploring scouts method (circles). 
The dashed lines show the optimal case where fi = i/n (see 
Eq. Q). Note, that the two methods give slightly different 
numbers of interfaces. 



fore use them to test the theoretical predictions made in 
section |II C| for the dependence of the computational ef- 
ficiency on p. To this end, we have used the exploring 
scouts method to carry out a series of DFFS simulations 
for the single particle test problem, with the transition 
probability p varying between 0.05 and 0.95. The param- 
eters of the method were as above, but with M = 3000. 
In these simulations, we measured the computational cost 
(in simulation steps) and the statistical error in the com- 
puted rate constant, allowing us to compute the compu- 
tational efficiency £ as defined by Eq.(|5|. Figure [7] shows 
the measured computational efficiency, as a function of 
the transition probability p, compared to the theoreti- 
cal prediction. The latter was computed using Eq.( |A9| , 
with Pb = 1.36 x 10~ 5 (the result obtained from our sim- 
ulations), R = 1.60 x 10 7 and S = 1.39 x 10 7 (both in 
simulation steps, and obtained by fitting the cost func- 



Figure 7: Single particle test case: computational efficiency 
£ as defined by Eq.|5]l, as a function of the success probabil- 
ity p. The data points show results of simulations using the 
trial interface method (for parameters see main text). The 
solid line shows the theoretical prediction of Eq. ( A9 1 with 
P B = 1.36 x 10~ 5 , R = 1.60 x 10 7 and S = 1.39 x 10 7 (see 
main text). The number of interfaces placed by the algorithm 
varied between 3 (for p — 0.05) and 671 (for p = 0.95). 



in remarkably good agreement with our theoretical pre- 
dictions, showing that the estimated optimal values of p 
obtained from the theory are indeed valid, at least for 
this problem. Taking the error bars into account, the 
value of Pb obtained in the our simulations is indepen- 
dent of p, justifying the use of p as a performance tuning 
parameter and showing that the FFS method remains 
valid regardless of the number of interfaces (which varies 
between 3 and 671 as p is varied between 0.05 and 0.95). 



B. Crystallization of Yukawa particles 

We now move on to a much more challenging test 
problem: crystal nucleation in a system of particles in- 
teracting via a combined Yukawa and Weeks-Chandler- 
Andersen (WCA) potential [41J, U (r) = £/ Yukawa (r) + 
Uwca(t) with 



U 



Yukawa 



exp(-K(r/(T - 1)) 

.( r ) = 6 / 

rja 



and 



Uwcx(r) = 




r < as 
else. 



(8) 



(9) 



tion (A6l to our simulation data). The simulations are 



The repulsive WCA potential is used to model the ex- 
cluded volume of the particles (note that the energy 
scale for the WCA potential is set to fc#T = 1 in our 
simulation units). The Yukawa potential is a screened 
Coulomb potential, suitable for modeling charged parti- 
cles whose electrostatic interactions are screened by sur- 
rounding ionic atmospheres. In this work, the parameters 
of the Yukawa potential are the value of the repulsive po- 
tential at contact e = 8 (in units of ksT) and the inverse 
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screening length n = 5 (in terms of the hard-sphere di- 
ameter a) . Despite important previous advances |42[ I43| , 
the mechanism by which crystal nucleation happens in 
screened Coulomb systems remains an open question, to 
which FFS simulations can contribute by providing both 
nucleation rates and transition paths [42J. However, be- 
cause the Yukawa interaction requires a larger cutoff ra- 
dius than the more widely studied Lennard-Jones inter- 
action, simulations of Yukawa particles are computation- 
ally expensive (especially for low salt conditions), which 
means that the number of trial trajectories which can be 
performed in an FFS simulation is limited. This makes 
setting up standard FFS simulations difficult, particu- 
larly under interesting conditions, e.g. close to coexis- 
tence where the transition rate is expected to be low |U] . 
Under these conditions, manual placing of the interfaces 
can easily lead to conditions where no FFS trial trajec- 
tories succeed in reaching the next interface. For such 
systems, automatic, optimal interface placement has the 
potential greatly to improve the feasibility and computa- 
tional efficiency of FFS simulations. 

We performed molecular dynamics (MD) simulations 
of 4096 WCA- Yukawa particles in a cubic box with pe- 
riodic boundary conditions in the NPT ensemble at con- 
stant pressure P = 38 (LJ units) with a Langevin ther- 
mostat using the software package ESPResSo |3S] in 
combination with DFFS, implemented in our rare event 
sampling framework FRESHS g3]. Note that FFS re- 
quires stochastic dynamics: here this is provided by the 
Langevin thermostat. The system is initially prepared 
in the liquid phase, which is undercooled (and therefore 
metastable). We are interested in the transition to the 
stable FCC crystal phase. Our order parameter A is the 
size of the largest cluster of solid particles, where par- 
ticles are classified as solid or liquid based on the local 
56 order parameter, as used in previous work [47 . The 
boundaries of the initial and final states were fixed such 
that the system is in the initial state if less than 0.5% 
of the particles are in the largest solid cluster and in the 
final state if more than 90% of the system's particles are 
in the largest solid cluster. This corresponds to A^ = 15 
and Xb = 3700. 

In our DFFS simulations, we compared three methods 
for interface placement: (i) placing the interfaces man- 
ually via a logarithmic scheme, (ii) the trial interface 
method and (iii) the exploring scouts method. All our 
DFFS simulations used No = 80 configurations at the 
first interface and M = 50 trial runs per interface. Here, 
we discuss only the performance of the interface place- 
ment methods; the nucleation rates and pathways gener- 
ated in the simulations will be presented elsewhere |48) . 

We first discuss the manual interface placement. For 
nucleation problems, where simulations are computation- 
ally very expensive, manual interface placement in FFS 
is very challenging. Our problem has a steep free en- 
ergy barrier and so placing interfaces evenly between Xa 
and Xb results in very poor success rates for early in- 
terfaces. In fact, for our problem, we did not obtain any 
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Figure 8: Yukawa test case: interface positions Xi (plotted 
as a function of interface index i) generated by the manual 
interface placement (open squares), the trial interface method 
(open triangles) and the exploring scouts method (circles). 



successes for early interfaces even with 100 evenly spaced 
interfaces. Therefore, as a "best possible" manual choice, 
based on this prior knowledge, we placed 36 interfaces 
logarithmically between A^ and A^, with closer spacing 
between the early interfaces. Even with this rather well- 
informed choice of interfaces, Figure [9ja) shows that we 
obtain success probabilities that are far from equal across 
interfaces (inset). Indeed, many of the pi values are very 
low: this is because the free energy landscape contains 
unforseen bottleneck regions, in which too few interfaces 
were placed. Because the success probabilities are low in 
these bottleneck regions, much computational effort will 
be wasted on failed trajectories. Another problem is also 
apparent: for later interfaces, the transition probabilities 
are extremely high (close to 1). In this region of the free 
energy landscape, the crystal grows spontaneously: the 
placement of unnecessary interfaces implies extra com- 
putational overhead in storing configurations, etc. The 
fact that the manual interface placement is far from op- 
timal is also apparent in the highly non-linear form of 
the function fi when plotted against the interface index 
i (main plot in Fig. |9ja)). 

We note that a commonly used approach to manual 
interface placement in FFS is to start with some initial 
guess, then if one obtains no successes for a given inter- 
face, shift it to a lower A value and continue the FFS sim- 
ulation. If not done carefully, this can actually bias the 
resulting computation of the rate constant Uab towards 
higher values, since for interfaces at which by chance one 
obtains a large number of successes, one makes no change, 
but for interfaces where by chance one obtains few suc- 
cesses one shifts the interface. If such a shifting approach 
is used, bias can only avoided by repeating the entire FFS 
simulation a posteriori - i.e. after the interface positions 
have been fixed. 

Figures [8] and [9] also show the results of the automatic 
interface placement methods. For both methods, we set 
^min = 3 and M-triai = 8. The value of d m - ln was chosen to 
prevent the system from crossing several interfaces in one 
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Figure 9: Yukawa test case: optimization criteria for the in- 
terface sets generated manually (a), with the trial interface 
method (b) and with the exploring scouts method (c) (for 
parameter values see main text). The main plots show the 
function /; of Eq. (pp, plotted against the interface index i, 
from our simulations (symbols) - the dashed lines show the 
optimal case, Eq. Q. The insets show the success probabili- 
ties pi plotted against the interface index i. 



Method 


kAB 


Manual placing 
Trial interface 
Exploring scouts 


1 x 10- i4± ^ 
6 x 10- 14±1 

2 x 10" 14±1 



Table I: Yukawa test case: rate constant kAB (in a~ i T~ 1 with 
the simulation time unit r) for DFFS simulations using the 
manual interface placement, trial interface method and ex- 
ploring scouts method. For parameter values, see main text. 
The error bars in kAB were determined by repeated indepen- 
dent simulations. 



Method 


Cost C 


Variance V 


Efficiency £ 


Manual placing 


7 x 10 b 


6648 


lO" 11 


Trial interface 


4 x 10 6 


188 


io- 9 


Exploring scouts 


3 x 10 B 


251 


lO" 9 



Table II: Yukawa test case: computational cost, variance in 
the rate constant and resulting computational efficiency, for 
DFFS simulations using the manual interface placement, trial 
interface method and exploring scouts method. For parameter 
values, see main text. The cost was measured in simulation 
timesteps including the cost of exploratory trial runs for the 
automatic interface placement methods. The variance in the 
rate constant was estimated using Eq. |2j, using simulation 
data for the pi values. 



This suggests that the free energy barrier to nucleation 
is located closer to Xa than to As - once the system has 
passed the barrier, the transition probability is always 
greater than the target value and thus no further inter- 
faces are necessary. However, without a priori knowl- 
edge, there would be no way to guess this when plac- 
ing the interfaces manually. Figure [9] (b) and (c) show 
that indeed both automatic interface placement methods 
perform well according to our optimization criteria: the 
success probabilities pi are much more uniform, with no 
very low pi values (insets). The functions fi are also 
much more linear for the automatic interface placement 
methods than for the manual interface placement (main 
plots) . 



MD timestep (simultaneous attachment of 3 particles in 
one step is unlikely) and to avoid correlation between tra- 
jectories at successive interfaces. For the trial interface 
method, we used p m ; n = 0.3 and p max = 0.6 and the ini- 
tial trial position for A^+i was set at A; + 0.1 (Xb — Xa)- 
For the exploring scouts method, we used pdes = 0.45 
and m max = 10000 timesteps. Figure [8] shows that both 
these methods produce similar interface numbers and po- 
sitions, which are very different from those of our man- 
ual interface placement. The automatic methods position 
the interfaces much closer to the A state: in fact there 
are no interfaces at all for A values greater than 1120. 



An obvious advantage to using the automatic interface 
placement methods is that setting up a DFFS simula- 
tion becomes very much easier and less time-consuming 
than using manual interface placement. In addition, the 
resulting DFFS calculations are more efficient with the 
optimized interface sets. Table |T] shows that the rate 
constants computed using the three interface placement 
methods are equivalent, but the error bars (computed by 
repeated FFS calculations) are larger for the manual in- 
terface placement. Moreover, as shown in Table [TTJ the 
computational cost of the FFS calculation, measured in 
simulation steps, was about a factor of 2 lower for the 
automatically placed interfaces than for those that were 
placed manually. Had we not used our prior knowledge 
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to place the manual interface set logarithmically, this fac- 
tor would have been even greater. For this problem, the 
exploring scouts method required about 25% fewer sim- 
ulation steps than the trial interface method. Table |TT] 
also shows estimates for the statistical errror in the rate 
constant (computed using Eq. Q), and the resulting 
computational efficiency. The estimated computational 
efficiency is two orders of magnitude higher using the au- 
tomatic interface placement methods, compared to the 
manual interface set. 



V. DISCUSSION 

The efficiency of interface-based rare event simulation 
methods such as FFS is strongly dependent on the lo- 
cations of the interfaces. Without a priori information, 
manual interface placement is a "hit and miss" task, that, 
for computationally intensive systems, often involves a 
large amount of user effort and results in non-optimal in- 
terface sets, for which the FFS calculations may be inef- 
ficient. In this paper, we have presented two methods for 
automatically placing interfaces on-the-fly in DFFS sim- 
ulations, so that the user need only choose the order pa- 
rameter, the definitions of the initial and final states, and 
the target transition probability (or its range). Build- 
ing on previous work by Borrero and Escobedo, we have 
analysed theoretically how the computational efficiency 
depends on the interface transition probability p, provid- 
ing an analytical expression for the optimal value of p for 
a given total transition probability Pb . We have further 
shown that in fact this optimum is broad and not very 
sensitive to Pb, so that for most problems target success 
probabilities in the range 0.3 — 0.7 are likely to produce 
satisfactory results. The lower bound of this range is set 
by the fact that efficiency decreases strongly when the 
success probability becomes too low. The upper bound 
is determined by the fact that trajectories will be highly 
correlated at successive interfaces if they are too close, 
meaning that little extra information is gained. 

Our two methods for automatic interface placement 
both work by firing a small number of trial trajecto- 
ries from an existing interface, to determine the posi- 
tion of the next interface. The methods differ in the way 
in which the information from these trial trajectories is 
used. In the trial interface method, a "trial" interface is 
placed, the probability of reaching this interface is esti- 
mated, and the trial interface is shifted until the esti- 
mated probability lies within an acceptable range. This 
method is very simple to implement in an existing DFFS 
code, because the information needed from the trial runs 
(simply whether they succeeded or failed) is the same as 
in a conventional FFS simulation. The interface shifting 
step can easily be parallelized and information from some 
of the trial runs can be re-used in the actual FFS step 
once the interface has been fixed. The exploring scouts 
method is in some ways more sophisticated: here, trial 
runs are fired from the existing interface and the dis- 



tribution of the maximum A values which they reach is 
used to determine a position for the next interface which 
corresponds to the target probability. This method has 
the advantage that one knows a priori how many trial 
runs will be needed to fix the interface position (impor- 
tant in some parallel implementations of FFS) and that 
the maximum length of these trial runs is fixed (albeit 
with some loss of accuracy in the interface position if the 
runs are too short). This may be important for problems 
where trial runs require many computational steps (e.g. 
free energy barriers that are not sharply peaked, or where 
returning to the initial state happens slowly). However, 
implementation of the exploring scouts method requires 
slightly more modifications to existing DFFS codes, since 
one needs to know the maximal value of A reached by the 
trial runs, rather than their success or failure, as in stan- 
dard DFFS. While we did not test it here, one could of 
course combine the trial interface and exploring scouts 
methods within a single DFFS run, for example estimat- 
ing the position of a trial interface using exploring scouts, 
then, in a second step, firing trial runs to the trial inter- 
face to check whether its probability is acceptable. 

We have demonstrated the use of both methods, for a 
simple example of a single particle in a one-dimensional 
potential (where we showed that the computational ef- 
ficiency indeed agrees well with our theoretical predic- 
tions), and for the more realistic example of the crys- 
tallization of Yukawa particles, a computationally inten- 
sive system where the nucleation free energy barrier is a 
priori unknown. In the latter case, automatic interface 
placement led to a large saving in both user and compu- 
tational time, compared to manual interface placement, 
even when the manual placement uses some prior knowl- 
edge of the shape of the free energy landscape. 

The methods presented here should greatly improve 
the feasibility and computational efficiency of DFFS 
simulations for computationally expensive systems. Of 
course, our methods and the approach of Borrero and 
Escobedo [35] are not incompatible: having placed a set 
of interfaces automatically using either the trial interface 
method or the exploring scouts method, one can further 
optimize their placement iteratively via the method of 
Borrero and Escobedo, if necessary. For the rare event 
problem tested here (the crystallization of Yukawa par- 
ticles), we found that this did not result in any further 
improvement. 

Our focus here has been on automatic interface place- 
ment for direct FFS (DFFS) simulations, in which the 
entire ensemble of trajectories is propagated forward in 
order parameter space, one interface at a time. In other 
variants of the FFS method (e.g. branched growth, 
Rosenbluth-like sampling p0ll22tl28]. S-PRES [30] or NS- 
FFS [31]), transition paths from initial to final state are 
instead generated in a one-at-a-time fashion. It should 
be possible to develop modifications of the automatic in- 
terface placement methods for these FFS variants: for 
example applying either the trial interface or exploring 
scouts method to fix the interfaces during the generation 
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of the first transition path. The approaches presented 
here should also be compatible with other interface-based 
rare event simulation methods such as transition inter- 
face sampling [8, 9 . Finally we note that the methods 
described here could be extended to interfaces that de- 
pend on more than one order parameter. For example, in 
the exploring scouts method, one might track the trajec- 
tories of the scouts in two coordinates and set the inter- 
faces to be optimal lines in the space of these coordinates. 
As well as optimising interface placement, this could also 
provide a way to adjust the choice of reaction coordinate, 
on-the-fly during an FFS simulation. 

The methods described in this paper have already been 
implemented in the parallel rare event simulation frame- 
work FRESHS [13], which allows the generic use of both 
FFS and other rare event simulation methods. This 
framework will soon be publicly available as an open- 
source package. 
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Appendix A: Optimal flux calculations 

Here we describe in more detail our theoretical analy- 
sis of the computational efficiency of DFFS, and present 
our analytical expression for the efficiency as a function 
of the transition probability. We assume that the tran- 
sition probability pi — p is the same for all interfaces. 
In contrast to the work of Borrero and Escobedo |38| . 
we do not fix the number n of interfaces. Instead, n is 
determined by the relation 



log Pb 
logp 



(Al) 



where Pb = Y\Pi is the probability that a trajectory 
leaving A reaches B before returning to A. Following 
|21j . we define the computational efficiency as 



1 

CV 



(A2) 



where C and V represent the computational cost of an 
FFS calculation, and the statistical error (variance) in 
the resulting rate constant measurement. We use the 
expressions for C and V derived in [21 J to predict the 
dependence of £ on the transition probability p. 



1. Computational cost 

The computational cost of a DFFS calculation, in sim- 
ulation steps, is approximated in |21| by 



n-l 



C N R + M Ci 



(A3) 



where Nq is the number of configurations stored at A^ , R 
is the cost of generating each of these configurations, M is 
the number of trials per interface (note we have assumed 
this to be constant) and Ci is the average cost of firing 
a trial run from interface i. Note that Eq.(A3l describes 



the total cost of the FFS run rather than the cost per 
configuration at Xa, as in ref. [21 . Simplifying somewhat 
the calculation in [3T], we assume that the cost of a trial 
run is linearly proportional to the number of interfaces 
that it crosses, with proportionality constant S/n (since 
the spacing between interfaces is inversely proportional 
to n: thus S is the cost of a trajectory from A to B). 
Thus a trial run from Ai to A^+i has cost S/n while a run 
from Ai to A^ has cost iS/n. Taking into account the 
relative probabilities of these outcomes gives 



C 4 «-[p + i(l-p)] 
n 

resulting in the following expression for the cost: 
C^N Q R+ — Y,\p + i(l-p)] 



(A4) 



i=i 



Sk 



(A5) 



iZ+ — [2p(n-l) + n(n-l)(l-i>)] 
2n 



where k = M/N . The first result in Eq.(A5l is identical 



to Eq.(A3) in the main text. Substituting in the expres- 



sion for n in terms of Pb we obtain an expression for the 
cost in terms of p and Pb' 



C 



No 



[2R\ogP B \ogp 



(A6) 



21ogplogP B 
+Sk(3 P log P B logp + log P B 2 

p\ogP B 2 ~2\ogp 2 - \ogP B logp)]. 
2. Statistical error 



The statistical error - i.e. the variance in a calculation 
of the rate constant by DFFS, is approximated as in |21| . 

by 



-Pi) 

p l M l 



Setting pi = p and Af$ = M we obtain 

Mp K A " N kp V logp , 



(A7) 



(A8) 
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3. Efficiency 



Bringing together Eqs.(A2l, (A6l and (A8), we obtain 



the following expression for the computational efficiency 
in terms of p and Pb- 



(2kplogP B \ogp 2 ) ■ [(p - l)(logP s - logp) 
•(logP B logp(5fc(l-3p)-2i?) 
+SklogP B 2 (p - 1) + 2Skplogp 2 )}- 1 



(A9) 



Expression ( A9 ) was used to generate the data shown in 
Figure [2] 
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